1. Introduction

[2] Differences between air temperature (Tair) near the
sea surface (e.g., at 10 m above the sea surface) and sea
surface temperature (Tsst) have important implications for
climate studies over the global ocean. Oceans exchange
energy with the atmosphere via evaporation and turbulent
transfer of sensible heat. Tair-Tsst is an important controlling
factor in these exchanges [Kraus and Businger, 1994;
Yu et al., 2004]. Air typically either gains (loses) heat from
(to) the ocean depending on the sign of Tair-Tsst through
the sensible heat flux [e.g., Cayan, 1992; Fairall et al.,
2003].

[3] In addition to the sign, the magnitude of Tair-Tsst
also plays a major role in maintaining the heating/cooling
processes over the ocean surface [e.g., Send et al., 1987;
Soloviev and Lukas, 1997; Soloviev et al., 2001]. Therefore,
heat budget studies for the ocean mixed layer and the
atmospheric boundary layer above the sea surface require
quantitative analysis of Tair-Tsst and factors affecting
this difference. Numerical ocean modeling studies [e.g.,
Murtugudde et al., 2002; Barron et al., 2004; Kara et al.,
2004] generally require knowledge of Tair-Tsst for stability
corrections in calculating wind stress and sensible and
latent heat fluxes [Kara et al., 2005].

1Oceanography Division, Naval Research Laboratory, Stennis Space
Center, Mississippi, USA.

2Department of Statistics, University of Wisconsin, Madison,
Wisconsin, USA.

Copyright 2007 by the American Geophysical Union.

0148-0227/07/2006JC003833$09.00

[4] An examination of the climatological monthly means
of Tair-Tsst reveals large spatial and temporal variations
over the global ocean (Figure 1), but Tsst is typically
warmer than Tair. Tair-Tsst can even be <-3°C along the
Kuroshio and Gulf Stream pathways. Because this
temperature difference varies regionally, in this
paper we investigate how different atmospheric variables
affect such changes in Tair-Tsst.

[5] As expected, differences between Tair and Tsst are
closely related to processes at the air-sea interface. A
typical example, as explained in Frankignoul
[1985], is that net surface heat flux, in particular a combination
of latent and sensible heat fluxes involving vapor mixing
ratio and Tair-Tsst values, is highly correlated with
Tair-Tsst, but weakly correlated with Tsst alone over most of the
mid-latitudes. This simply indicates that Tair-Tsst cannot
be derived from Tair or Tsst by itself, implying the existence
of other variables in its regulation.

[6] Given the increasing emphasis placed on studying the
ocean's role in climate dynamics, as mentioned above,
understanding the relationship between Tair and Tsst is
essential. Thus, the major objective of this paper is to
address the question, "which atmospheric variable has the
greatest influence on Tair-Tsst?" Possible answers to this
question are sought on climatological timescales by analyzing
data from two global data sets using various statistical
approaches. In particular, we use atmospheric variables
at/near the sea surface (net solar radiation, wind speed, vapor
mixing ratio and precipitation) to examine the importance
order of each one in driving the seasonal cycle of Tair-Tsst.

Figure 1. Climatological mean air-sea surface temperature
differences over the global ocean as obtained from
(a) ERA-40 and (b) COADS. Construction of both data sets
is described in section 2. Tair-Tsst values at high latitudes
(Arctic and Antarctic) will not be used in this paper due to
the existence of ice.

[7] The paper is divided into six sections. First, data sets
and statistical metrics used throughout the paper are
described (section 2), followed by an analysis of the
relationship between Tair and Tsst (section 3). Next, the
influence of atmospheric forcing variables on Tair-Tsst
is investigated over the global ocean on climatological
timescales (section 4), and important variables that are
essential in driving the seasonal cycle of Tair-Tsst are
investigated by building a regression tree model that can
fit piecewise linear models (section 5). Finally, conclusions
of the paper are provided (section 6).

2.  Data  and  Statistical  Metrics 

[8] The relationship between Tair and Tsst (and between
Tair-Tsst and various meteorological variables) is investigated
using global monthly mean climatological data from
two sources: (1) 1.125° × 1.125° European Centre for
Medium-Range Weather Forecasts (ECMWF) 40-year
Re-Analysis (ERA-40) climatology formed over the years
1957-2002, and (2) 1/2° × 1/2° Comprehensive Ocean
Atmosphere Data Set (COADS) climatology formed over
1945-1989. The latter is the new COADS climatology
based on the Atlas of Surface Marine Data, Supplement B
(http://www.nodc.noaa.gov/OC5/asmdnew.html). Details of
the archived ERA-40 (a numerical model product) and
observation-based COADS data set (constructed mainly
from ship observations) can be found in Kållberg et al.
[2004] and da Silva et al. [1994], respectively. For consistency,
climatological monthly means of Tair and Tsst from
ERA-40 and COADS are interpolated to a common grid of
1.0° × 1.0° for the analyses.

[9] We directly obtain monthly mean climatological
fields from COADS, while in the case of ERA-40 we
construct monthly climatological data of Tair and Tsst based
on 6 hourly model output covering the period 1979-2002.
The data set from the ERA-40 project covering the full
analysis period (1957-2002) is not used because earlier
time periods did not include many observational and
satellite-based data sets in the re-analysis. Note that Tair from
ERA-40 is at 2 m, while that from COADS is at 10 m above
the sea surface.

[10] Monthly mean climatologies of Tair-Tsst reveal the
existence of a strong seasonal cycle in many regions, as
evident from both data sets (Figure 2). For example, Tair is
as much as 5°C colder than Tsst along the western
boundaries of ocean basins during February and November, and
Tair-Tsst can even be positive (>0°C) in May and August.
Substantial seasonal variability is also evident over most of
the Indian, Atlantic and Pacific Oceans. Globally, monthly
mean Tair-Tsst values from both data sets agree with each other
reasonably well, within 0.5°C (Table 1), indicating their
consistency for use in this study.

[11] Given the large variability in Tair-Tsst, we will first
investigate the relationship between Tair and Tsst. This is
done to determine the source of difference between the two.
Time series of Tair and Tsst at each ocean grid point from
ERA-40 and COADS are compared using various statistical
metrics: mean difference (MD), root-mean-square (RMS)
difference, correlation coefficient (R) and non-dimensional
skill score (SS). Let $X_i$ ($i = 1, 2, \ldots, n$) be the set of n Tsst
(reference) values, and let $Y_i$ ($i = 1, 2, \ldots, n$) be the set of
n Tair values. Also let $\bar{X}$ ($\bar{Y}$) and $\sigma_X$ ($\sigma_Y$) be the means and
standard deviations of Tsst (Tair) values, respectively.

[12] Following Wilks [1995], the statistical metrics used
throughout the paper are as follows:
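
Written with X denoting Tsst (the reference) and Y denoting Tair, and assuming the standard forms of these metrics (in particular the usual decomposition of the skill score into $R^2$ minus a conditional and an unconditional bias), the four metrics can be expressed as in the sketch below. The underbrace labels $B_{\mathrm{cond}}$ and $B_{\mathrm{uncond}}$ and the use of $1/n$ normalization are assumptions of this reconstruction.

```latex
\begin{align}
\mathrm{MD}  &= \bar{Y} - \bar{X}, \tag{1}\\
\mathrm{RMS} &= \left[ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - X_i \right)^2 \right]^{1/2}, \tag{2}\\
R            &= \frac{1}{n} \sum_{i=1}^{n}
                \frac{\left( X_i - \bar{X} \right)\left( Y_i - \bar{Y} \right)}
                     {\sigma_X \, \sigma_Y}, \tag{3}\\
\mathrm{SS}  &= R^2
                - \underbrace{\left[ R - \frac{\sigma_Y}{\sigma_X} \right]^2}_{B_{\mathrm{cond}}}
                - \underbrace{\left[ \frac{\bar{Y} - \bar{X}}{\sigma_X} \right]^2}_{B_{\mathrm{uncond}}}. \tag{4}
\end{align}
```
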
where n is equal to 12 (January through December) at each
point of a 1° grid over the global ocean. In particular, MD is
the annual mean of Tair-Tsst. RMS can be considered as an
absolute measure of the distance between the Tair and Tsst
time series. The R value is a measure of the degree of linear
association between the two variables.

Figure 2. Climatological monthly mean air-sea surface temperature
difference over the global ocean in February, May, August and
November. They are obtained from two data sets: (a) ERA-40 and
(b) COADS. The regions where ice exists are shown in gray. An ice
land mask is used to determine the ice-free regions over the global
ocean as explained in the text.


Table 1. Global Average and Standard Deviations of Climatological
Mean Tair-Tsst^a

           Global Mean, °C        Standard Dev., °C
Month      ERA-40     COADS       ERA-40     COADS
Jan        -1.01      -0.69       1.05       1.06
Feb        -0.97      -0.67       0.94       0.97
Mar        -0.95      -0.65       0.67       0.68
Apr        -0.91      -0.60       0.52       0.58
May        -0.89      -0.61       0.58       0.69
Jun        -0.89      -0.66       0.68       0.82
Jul        -0.90      -0.64       0.70       0.71
Aug        -0.89      -0.58       0.59       0.73
Sep        -0.93      -0.58       0.46       0.60
Oct        -0.95      -0.59       0.50       0.68
Nov        -0.99      -0.61       0.72       0.82
Dec        -1.01      -0.68       0.97       0.99
All        -0.94      -0.63       0.70       0.78

^a Mean and standard deviation values are calculated only in ice-free
regions over the global ocean. The last row (All) denotes values calculated
over the seasonal cycle.

[13] SS in equation (4) is the fraction of variance in Tair
explained by Tsst minus two non-dimensional biases
(conditional bias, $B_{\mathrm{cond}}$, and unconditional bias, $B_{\mathrm{uncond}}$)
which are not taken into account in the correlation
coefficient. $B_{\mathrm{uncond}}$ (also called systematic bias) is a
non-dimensional measure of the difference between the mean
values of the Tair and Tsst time series, and $B_{\mathrm{cond}}$ is a
measure of the relative amplitude of the variability in the
two data sets. An examination of the SS formulation
reveals that $R^2$ is equal to SS only when $B_{\mathrm{cond}}$ and $B_{\mathrm{uncond}}$
are zero. Because these two biases are never negative,
the $R^2$ value can be considered to be a measure of the
"potential" skill in using Tsst to estimate Tair, i.e., the
skill that one can obtain when there are no differences
between Tair and Tsst. An SS value of 1.0 indicates that
Tair and Tsst are identical, and SS can be negative if there
is no skill between Tair and Tsst.
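
As an illustration of how the four metrics of section 2 behave for a single grid point, the following minimal sketch computes MD, RMS, R and SS from two 12-value monthly series. It assumes the standard skill-score decomposition (SS equals R squared minus a conditional and an unconditional bias) and population (1/n) statistics; the function name `skill_metrics` and the sample values are hypothetical and are not taken from ERA-40 or COADS.

```python
import math


def skill_metrics(tsst, tair):
    """Return MD, RMS, R, SS and both bias terms for paired monthly series.

    tsst is treated as the reference series (X) and tair as the series
    being evaluated (Y). Population (1/n) statistics are assumed.
    """
    n = len(tsst)
    if n != len(tair) or n < 2:
        raise ValueError("need two series of equal length (n >= 2)")

    x_mean = sum(tsst) / n
    y_mean = sum(tair) / n
    sigma_x = math.sqrt(sum((x - x_mean) ** 2 for x in tsst) / n)
    sigma_y = math.sqrt(sum((y - y_mean) ** 2 for y in tair) / n)

    # Mean difference: annual mean of Tair minus Tsst.
    md = y_mean - x_mean
    # RMS difference between the two time series.
    rms = math.sqrt(sum((y - x) ** 2 for x, y in zip(tsst, tair)) / n)
    # Linear correlation coefficient over the seasonal cycle.
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(tsst, tair)) / n
    r = cov / (sigma_x * sigma_y)

    # Conditional bias: mismatch in the amplitude of variability.
    b_cond = (r - sigma_y / sigma_x) ** 2
    # Unconditional (systematic) bias: mismatch in the means.
    b_uncond = (md / sigma_x) ** 2
    ss = r ** 2 - b_cond - b_uncond

    return {"MD": md, "RMS": rms, "R": r, "SS": ss,
            "B_cond": b_cond, "B_uncond": b_uncond}


if __name__ == "__main__":
    # Synthetic monthly climatologies (deg C), for illustration only.
    tsst = [18.0, 17.5, 18.2, 19.6, 21.4, 23.5,
            25.1, 25.8, 24.9, 23.0, 21.0, 19.2]
    tair = [16.5, 16.2, 17.3, 19.0, 21.0, 23.2,
            24.8, 25.3, 24.0, 21.8, 19.5, 17.6]
    for name, value in skill_metrics(tsst, tair).items():
        print(f"{name:9s} {value: .3f}")
```

Under these assumptions SS is algebraically the same as 1 minus the squared RMS difference divided by the variance of Tsst, which is why a large mean offset alone can drive SS negative even when R is close to 1.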

3. Statistical Relationship Between Tair and Tsst

[14] Comparisons between Tair and Tsst (Figure 3) are
performed using the data sets (ERA-40 and COADS) and
statistical metrics, both of which are described in
section 2. Regions where ice is present (e.g., high latitudes)
are masked and shown in gray. The ice-free regions over
the global ocean are determined from an ice land mask
[Reynolds et al., 2002]. The ice land mask is a function of
the ice analysis and may change periodically. For this
reason, a climatological mean of maximum ice extent for
the mask is used in all calculations.

[15] Ignoring high latitudes where sea ice forms, the
mean difference fields are broadly similar to each other
with Tsst warmer than Tair (generally by <1°C) nearly
everywhere over the global ocean. Tsst is warmer because
solar radiation is absorbed more efficiently by the ocean
(and land) than by the atmosphere (i.e., the troposphere is
heated from below). In addition, warm Tsst relative to the
subsurface usually gives stable stratification, while the
situation is opposite for the atmosphere. Having Tsst
warmer than Tair simply means that the average sensible
heat flux is almost always cooling (warming) the ocean
(atmosphere) on climatological timescales.

[16] A striking feature of Figure 3 is that relatively large
Tair-Tsst values (even as large as -5°C) do exist in
mid-latitudes along the western boundaries (Kuroshio and Gulf
Stream pathways), where the RMS difference between the
two is generally >3°C. Atmospheric advection of Tair and
oceanic advection of Tsst play an important role in
determining Tair-Tsst in these regions [Yasuda et al.,
2000; Qu et al., 2004; Dong and Kelly, 2004]. Overall,
the results from ERA-40 and COADS are similar except
for differences in some regions, such as the southwestern
Pacific, including some regions of the Indian Ocean and
high southern latitudes. Such discrepancies are generally
seen from maps of RMS, SS and $B_{\mathrm{uncond}}$. Not surprisingly,
the discrepancies tend to occur in regions of sparse
observations, especially at high latitudes.

[17] There is a close relationship (large R) between Tair
and Tsst over the annual cycle in most regions (Figure 3).
However, there is almost no skill between the two in three
major regions: (i) most of the tropics, extending even to
mid-latitudes in some places, (ii) along the western
boundaries, and (iii) at high southern and North Atlantic
latitudes. The low skill in all three regions is due mainly
to large differences in the mean Tair and Tsst values
(i.e., large $B_{\mathrm{uncond}}$ over the seasonal cycle). For COADS,
$B_{\mathrm{cond}}$ and low or even negative R values play a substantial
role in giving low or negative skill in the western equatorial
Pacific warm pool and at high southern latitudes.
Overall, R values are generally very high (>0.9) over most
of the global ocean. However, further analysis reveals that
when Tair >27°C (i.e., regions around the equator) and Tair
<5°C (i.e., high southern latitudes), R values are typically
<0.5, especially for COADS.

[18] Based on the zonally averaged statistical metrics
between Tair and Tsst (Figure 4), it is further confirmed
that ERA-40 and COADS give similar results over most of
the global ocean. However, noticeable differences do show
up in SS values south of 40°S. As mentioned earlier, this is
due partly to the fact that the seasonal cycle of Tair and Tsst
(i.e., R) is quite different between the two data sets at those
latitude bands. The combination of different R and $B_{\mathrm{cond}}$
values in the two data sets results in the large differences in
skill score south of 40°S.

[19] We also present scatter diagrams for Tair versus
Tsst, Tair versus Tair-Tsst, and Tsst versus Tair-Tsst
(Figure 5). This is done to further examine the relationship
between Tair and Tsst and decide which one (Tair or
Tsst) controls Tair-Tsst. There is a strong linear relationship
between Tair and Tsst with an R value >0.99 for both
ERA-40 and COADS over the global ocean. Unlike Tair
versus Tsst, there is no linear relationship between Tair
(or Tsst) and Tair-Tsst (R ≈ 0), suggesting that neither
Tair nor Tsst modulates Tair-Tsst over the global ocean.
This simply indicates that, as expected, there must be
other factors that control Tair-Tsst, at least in some
regions of the global ocean.

4.  Effects  of  Atmospheric  Variables  on  Tair-Tsst 

[20] The results in the preceding section demonstrate
that the climatological mean of Tair-Tsst must be controlled
by variables other than Tair or Tsst itself. Thus, our
focus here is to examine the possible effects of near-surface
atmospheric variables in driving the seasonal cycle
of Tair-Tsst. We consider several scalar atmospheric
variables: wind speed at 10 m above the sea surface, air
mixing ratio at 10 m above the sea surface, net radiation
(the total of net shortwave and net longwave radiation) at
the sea surface, Tair, Tsst and precipitation at the sea
surface. These variables are specifically chosen because
they are typically used as atmospheric forcing for coupled
ocean-atmosphere and ocean general circulation models
[e.g., Haidvogel and Bryan, 1992]. Monthly means of these
variables obtained from the COADS and ERA-40 data sets
are interpolated to a common 1.0° × 1.0° global grid.

[21] Linear correlation coefficients between Tair-Tsst
and the atmospheric forcing variables mentioned above are
calculated at each 1.0° × 1.0° grid box. They are then
mapped over the global ocean (Figure 6). Correlation values
for the seasonal cycle based on equation (3) in section 2
reveal a strong (or weak) positive (or negative) relationship
between Tair-Tsst and other variables. There is a strong and
positive relationship between Tair-Tsst and net solar
radiation at the sea surface, especially from the mid- to high
latitudes. Note that one must have at least an R value of
±0.53 for it to be statistically different from a zero
correlation (R = 0) at a 95% confidence level based on the
12 monthly values at each grid point over the global ocean.

Figure 3. Spatial maps of statistical metrics (see section 2)
calculated between climatological monthly means of Tair and Tsst
over the global ocean for (a) ERA-40 and (b) COADS. In the maps,
except for the mean difference, white (red) is intended to represent
a tendency for good (poor) relationship between the two variables.

Figure 4. Zonal averages of statistical metrics shown in Figure 3.
Zonal averaging was performed at each 1° latitude belt over the
global ocean.


[22] While there are relatively large differences in R values
from ERA-40 and COADS in some regions (e.g., southern
hemisphere), the fields are broadly similar over most of the
global ocean (Figure 6). In some regions, R values are high
even when there are noticeable differences in the magnitudes
of atmospheric variables from the two data sets. This
is due to the fact that the shape and phase of the seasonal
cycles for all of the atmospheric variables, except
precipitation, are very similar, as illustrated for the latitude
belt at 30°N (Figure 7).

[23] The lowest and statistically insignificant R values
are generally found in the equatorial regions. These
statistically insignificant R values in comparison to those
at other latitudes are also evident from the zonally
averaged correlation values (Figure 8). This is true
regardless of the atmospheric variable correlated with
Tair-Tsst. Therefore, it appears that none of the
atmospheric variables modulate Tair-Tsst in that region. On
the contrary, Tair-Tsst is generally driven by a combination
of all atmospheric variables, which may be linearly
dependent themselves, along the western boundaries
including the Gulf Stream and Kuroshio current systems, as
evident from the significantly large R values close to 1.
Wind speed at 10 m above the sea surface and net solar
radiation at the sea surface are the two main variables
tracked by Tair-Tsst at the subtropical northern and
southern latitudes.

Figure 5. Scatterplots for Tair versus Tsst, Tair versus
Tair-Tsst and Tsst versus Tair-Tsst based on annual mean
values at each 1° bin over the global ocean. A linear
regression model for Tair-Tsst, fitted to the 31671 points,
gives a constant slope of 1.0 for both data sets. Note that the
large spread between Tair and Tsst is mostly due to the
values along the western boundary currents. Linear
regression equations are Tsst = 0.93 + 1.00 Tair for ERA-40
and Tsst = 0.63 + 1.00 Tair for COADS.

[24] An interesting feature that is evident from Figure 6 is
that unlike other variables, wind speed and precipitation
have strong negative correlations with Tair-Tsst over most
of the global ocean. Figure 9 shows the cumulative
frequency of correlations between Tair-Tsst and all variables.
As mentioned previously, R is calculated over the seasonal
cycle. Overall, 38% (28%) of R values between Tair-Tsst
and wind speed are <-0.6 for ERA-40 (COADS), and
similarly 36% (28%) of R values between Tair-Tsst and
precipitation are <-0.6 for ERA-40 (COADS) over the
global ocean (Table 2).

[25] Median values are calculated to obtain a quantitative
analysis of correlations between Tair-Tsst and other
variables over the global ocean. All R values are ordered from
lowest to highest value, and the middle value corresponding
to 50% is picked to find the median correlation for each
case. Median R using data from ERA-40 (COADS) is the
highest with a value of 0.84 (0.85) when Tair-Tsst is
correlated to net solar radiation at the sea surface (Table 3).
Since the median R values are not that large between
Tair-Tsst and all other variables, net solar radiation at the sea
surface plays an important role in maintaining the seasonal
cycle of Tair-Tsst over the global ocean. Regardless of
which data set (ERA-40 or COADS) is used for calculating
R, the median values are generally very close to each other,
except for wind speed, and global averages of R are almost
the same (Table 3). This confirms the robustness and
consistency of the results.
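
The median and cumulative-frequency summaries described here amount to sorting the grid-point correlations and reading off quantiles. A minimal sketch of that bookkeeping is given below; the function names and the short list of R values are hypothetical and serve only to show the arithmetic, not to reproduce any value in Tables 2 or 3.

```python
import statistics


def median_correlation(r_values):
    """Median of a collection of grid-point correlation coefficients."""
    return statistics.median(r_values)


def percent_below(r_values, threshold):
    """Percentage of correlation coefficients strictly below a threshold."""
    r_values = list(r_values)
    if not r_values:
        raise ValueError("r_values must not be empty")
    count = sum(1 for r in r_values if r < threshold)
    return 100.0 * count / len(r_values)


def cumulative_percentages(r_values, edges):
    """Cumulative percentage of R values below each class-interval edge."""
    return [(edge, percent_below(r_values, edge)) for edge in edges]


if __name__ == "__main__":
    # Hypothetical correlations at eight grid points, for illustration only.
    sample_r = [-0.9, -0.7, -0.65, -0.2, 0.1, 0.4, 0.8, 0.95]
    print("median R:", median_correlation(sample_r))
    print("% of R < -0.6:", percent_below(sample_r, -0.6))
    for edge, pct in cumulative_percentages(sample_r, [-0.5, 0.0, 0.5, 1.0]):
        print(f"R < {edge:+.1f}: {pct:.1f}%")
```

Each grid point counts equally in this sketch, which is only reasonable if the bins cover roughly equal areas.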

[26] Using R values presented in Figure 6, we calculate
the overall percentage of variance explained by each
atmospheric variable. This is done to quantitatively
identify the strongest of these predictors. Squares of the
correlation values (i.e., R^2) between each atmospheric variable and
Tair-Tsst are first obtained at each grid point. The
maximum of them is then determined. This process is
repeated at each grid point over the global ocean, yielding
a map of the most important variables (Figure 10).
Overall, most of the variance in the seasonal cycle of
Tair-Tsst is again explained by the net solar radiation at
the sea surface for 50.8% (57.0%) of the global ocean,
when ERA-40 (COADS) data are used. There are some
regional differences though. For example, wind speed is
the most effective of the variables over 18.7% of the
global ocean when using ERA-40, while the percentage
drops to less than half that value (9.1%) for the
COADS data set. Other regional differences exist in both
data sets, especially at high southern latitudes where both
data sets suffer from a lack of quality observational data.
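
The selection of the most important variable at each grid point can be sketched as picking the variable with the largest squared correlation and then tallying how often each variable wins. The sketch below assumes every grid point carries equal weight (that is, approximately equal-area bins); the variable labels and the R values are hypothetical.

```python
from collections import Counter


def dominant_variable(r_by_variable):
    """Name of the variable whose squared correlation is largest."""
    if not r_by_variable:
        raise ValueError("no correlations supplied")
    return max(r_by_variable, key=lambda name: r_by_variable[name] ** 2)


def dominance_percentages(grid_points):
    """Percentage of grid points at which each variable explains most variance.

    grid_points is a list of dicts mapping variable name to R at that point.
    Every point is weighted equally (an assumption of this sketch).
    """
    winners = Counter(dominant_variable(point) for point in grid_points)
    total = len(grid_points)
    return {name: 100.0 * count / total for name, count in winners.items()}


if __name__ == "__main__":
    # Hypothetical correlations with Tair-Tsst at four grid points.
    points = [
        {"solar": 0.9, "wind": -0.5, "precip": -0.3},
        {"solar": 0.4, "wind": -0.8, "precip": -0.1},
        {"solar": 0.7, "wind": -0.6, "precip": -0.2},
        {"solar": 0.2, "wind": -0.1, "precip": -0.6},
    ]
    print(dominance_percentages(points))
```

Squaring before comparing matters because a strong negative correlation, such as that of wind speed, explains as much variance as a positive one of the same magnitude.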

5. What Controls the Climatological Mean of Tair-Tsst?

[27] As explained in section 4, in addition to net solar
radiation at the sea surface, Tair-Tsst is strongly correlated
with wind speed and air mixing ratio at 10 m, Tair and Tsst
depending on the region of the global ocean (Figure 8). In
most cases, there is more than one variable that has a direct
effect (or a linearly dependent effect) on Tair-Tsst because
the linear correlation values between Tair-Tsst and two or
more variables can be quite high for a given grid point over
most of the global ocean (see Figure 6).

[28] Given the results in section 4, two main questions
arise: (1) which atmospheric forcing variable is the most
important one driving the seasonal cycle of Tair-Tsst over
the global ocean, and (2) what is the importance order of
each variable in affecting Tair-Tsst? To answer these
questions, we will build a prediction algorithm for
Tair-Tsst (section 5.1), and discuss the importance order for each
atmospheric variable (section 5.2).

C05020  KARA ET AL.: AIR-SEA TEMPERATURE DIFFERENCES  C05020

Figure 6. Correlation coefficients between Tair-Tsst and atmospheric
variables calculated over the seasonal cycle for (a) ERA-40 and
(b) COADS. Positive (negative) correlations are in red (blue).

Figure 7. Climatological monthly mean time series for atmospheric
forcing variables averaged over the latitude belt of 30°N from top
to bottom: wind speed at 10 m above the sea surface, net solar
radiation at the sea surface, vapor mixing ratio at 10 m above the
sea surface, Tair, Tsst, and precipitation (×10^9) at the sea surface.
Results for ERA-40 (open circles) and COADS (filled squares)
are shown separately.

[29] The six variables discussed in section 4 are considered
as potential predictors in this proposed prediction
algorithm. The methodology should have several
characteristics that ensure its usefulness and validity. Among the
most important of these considerations is that the
methodology allow for statistical significance testing by way of
cross validation and for nonlinear relationships between
predictors and Tair-Tsst. The methodology also needs to
provide useful and interpretable results. Methods such as
linear programming do not allow for validation of the
results, while purely statistical methods, such as regression
and discriminant analysis, do not easily allow for nonlinear
relationships [Breiman et al., 1984].

Figure 8. Zonal averages of correlation coefficients shown in
Figure 6. Zonal averaging was performed at each 1° latitude belt
over the global ocean.

5.1.  Prediction  Methodology 

[30] The importance order of atmospheric variables in
driving the seasonal cycle of Tair-Tsst is examined using
Generalized, Unbiased, Interaction Detection and Estimation
(GUIDE), which is a regression tree model [Loh, 2002].
A brief description of GUIDE is given in Appendix A. The
goal of a regression tree is to predict or explain the effect of
one or more variables on a dependent variable. GUIDE can
fit piecewise linear models for Tair-Tsst based on the
atmospheric variables. In essence, a regression tree is a
piecewise constant or piecewise linear estimate of a
regression function, constructed by recursively partitioning the
data set and sample space.
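
To make the idea of a piecewise linear regression tree concrete, the sketch below fits the simplest possible tree: a single split on one predictor, with a separate least squares line in each of the two leaves. This is not the GUIDE algorithm (GUIDE selects split variables with statistical tests and handles several predictors); here the split point is simply the one that minimizes the total squared error, and all names (`fit_line`, `fit_stump`, `predict`) and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares line; returns (intercept, slope, sse)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_stump(xs, ys, min_leaf=3):
    """One-split piecewise linear tree chosen by minimum total SSE.

    Returns None when no split leaves at least min_leaf points per leaf.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left_x = [p[0] for p in pairs[:i]]
        left_y = [p[1] for p in pairs[:i]]
        right_x = [p[0] for p in pairs[i:]]
        right_y = [p[1] for p in pairs[i:]]
        l_int, l_slope, l_sse = fit_line(left_x, left_y)
        r_int, r_slope, r_sse = fit_line(right_x, right_y)
        total = l_sse + r_sse
        if best is None or total < best["sse"]:
            best = {
                "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2.0,
                "left": (l_int, l_slope),
                "right": (r_int, r_slope),
                "sse": total,
            }
    return best


def predict(model, x):
    """Evaluate the leaf model that x falls into."""
    intercept, slope = model["left"] if x < model["threshold"] else model["right"]
    return intercept + slope * x


if __name__ == "__main__":
    # Synthetic data with a change of regime at x = 4.5, for illustration.
    xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    ys = [0, 2, 4, 6, 8, 15, 14, 13, 12, 11]
    tree = fit_stump(xs, ys)
    print(tree["threshold"], tree["left"], tree["right"])
```

Applying the same split search recursively inside each leaf, and over several predictors, would give a full piecewise linear tree; the single split is kept here only to keep the sketch short.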

[31] In the GUIDE analysis as applied to this
investigation, we use climatological monthly means of the
dependent variable (Tair-Tsst) and six predictors (wind
speed at 10 m above the sea surface, net solar radiation at
the sea surface, air mixing ratio at 10 m, Tair, Tsst and
precipitation). These mean values are extracted from
ERA-40 and COADS at 1° × 1° grid boxes over the
global ocean. This is done for each month separately. We
also combine all the mean monthly data to examine
atmospheric variables which control the seasonal cycle
of Tair-Tsst.

Figure 9. Cumulative percentage of correlation coefficients between
Tair-Tsst and other atmospheric variables using monthly mean
climatological data from ERA-40 and COADS (x axis: correlation
coefficient). The median value is the point intersecting the 50% line.

[32] The values in each 1° grid bin are just the sum of the
values at grid points in the bin divided by the number of
such grid points. Thus, there is no areal averaging. In other
words, the assumption is that individual bins are small
enough that areal weighting is not needed within the bin.
However, the width (in longitude) of the bins is adjusted so
that the size of the bin in m^2 is approximately constant.
Thus, at the equator each 1° bin is 1° by 1°, but starting at
60° it becomes 1° in latitude by 3° in longitude. At the pole,
essentially all the grid points would be one big bin (in
longitude). Since we masked values (i.e., did not use them)
at very high latitudes due to ice, weighting the data points
by areal coverage, i.e., having larger (smaller) values at the
equator (near the poles), is not a concern in this study.

Table 2. Percentages of Correlation Coefficients Shown in Figure 6^a

                                                                3.5
                 COADS     4.0     4.3     6.1     6.1     9.3     3.6
0.3 < R < 0.4    ERA-40    2.3     2.8     5.1     6.5     8.2     3.4
                 COADS     3.9     4.9     7.5     8.4    10.6     3.1
0.4 < R < 0.5    ERA-40    2.6     3.5     6.8     8.3    10.0     3.7
                 COADS     3.5     5.2    10.3    11.2    10.8     2.2
0.5 < R < 0.6    ERA-40    3.0     4.5     8.4     8.9    10.9     4.2
                 COADS     3.3     5.4    13.0    14.2     8.5     1.5
0.6 < R < 0.7    ERA-40    3.6     6.3    10.9    10.3    10.2     4.4
                 COADS     2.9     5.9    13.7    13.6     6.1     1.0
0.7 < R < 0.8    ERA-40    4.4     9.8    14.2    11.9     8.9     3.7
                 COADS     2.8     7.7    10.1    10.K     4.2     0.5
0.8 < R < 0.9    ERA-40    5.7    20.0    15.8    13.2     4.2     2.5
                 COADS     2.1    15.5     6.8     7.9     1.2     0.2
0.9 < R < 1.0    ERA-40    3.8    35.8     8.0     5.4     1.0     0.6
                 COADS     0.4    36.1     1.9     2.7     0.3     0.1

^a Correlation coefficients are between Tair-Tsst and other atmospheric
variables. The class intervals are 0.1 wide and range from -1.0
through 1.0. The highest percentage value is printed in boldface.

[33] An example of how GUIDE proceeds is illustrated in
Figure 11. It shows a regression tree obtained using data
from ERA-40 in January. The tree is built on x and y, where
y is the variable Tair-Tsst, and the predictor variables are
x = (radflx, vapmix, wndspd, precip). The abbreviations are
defined in the figure caption. In this particular example, the
regression tree first partitions the results based on vapmix.
Observations with vapmix < 0.0048 kg kg^-1 (4.8 g kg^-1)
go to the left branch and otherwise to the right branch. A
least squares function linear in the four predictor variables is
fitted to the data in each leaf node of the tree. The average
value of Tair-Tsst in each leaf node is in italics (Figure 11).


The fact that the GUIDE tree first splits on vapmix (i.e.,
vapor mixing ratio at the sea surface) implies that the latter
has the largest nonlinear effect on the seasonal cycle of
Tair-Tsst in January. If vapmix is <4.8 g kg^-1, the variable
with the next largest nonlinear effect is wndspd (near-surface
wind speed). On the other hand, when vapmix is


Table 3. Mean and Median Correlation Coefficients Over the
Global Ocean^a

                   Global Mean          Global Median
Variable           ERA-40    COADS      ERA-40    COADS
Wind speed         -0.27     -0.26      -0.48     -0.29
Solar rad.          0.65      0.64       0.84      0.85
Mixing ratio        0.40      0.33       0.58      0.44
Tair                0.34      0.38       0.51      0.50
Tsst                0.17      0.10       0.38      0.21
Precipitation      -0.28     -0.37      -0.41     -0.38

^a Correlation values are obtained from a least squares analysis with
Tair-Tsst as the dependent variable and six predictor variables, calculated in
ice-free regions over the global ocean.
Figure 10. Regions showing which atmospheric variable
explains the largest percentage of the variance in Tair-Tsst
over the global ocean for (a) ERA-40 and (b) COADS. They are
determined from correlation values shown in Figure 6, as
explained in the text. Areal percentage over the global ocean
where the maximum variance is explained by each atmospheric
variable (wind speed, net solar radiation, vapor mixing ratio,
Tair, Tsst and precipitation) is as follows: 18.7 (9.1),
50.8 (57.0), 12.2 (3.8), 3.2 (9.3), 6.5 (10.1), 10.7% (8.6%)
for the ERA-40 (COADS) data set, respectively.


>4.8 g kg^-1, radflx (net solar radiation) has the next largest
effect.

[34] The estimation procedure in Figure 11 begins by
creating an initial decision node and then adding further
nodes as constrained by the tree growth parameters.
Because it is always possible to obtain zero apparent
prediction error by partitioning the predictor space so
finely that each node contains just enough observations
to fit a multiple linear model perfectly, a criterion is
necessary to determine the optimal tree size. This is
achieved by first constructing an overly large tree and
then pruning it to minimize a cross-validated estimate of
expected square prediction error. Here pruning refers to an
objective tree selection procedure that finds the subtree
having the best estimated predictive accuracy.
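
The need for a cross-validated criterion can be illustrated with a small numerical sketch. It is not the GUIDE pruning procedure itself: it uses synthetic pure-noise data, ordinary least squares in place of GUIDE's node models, and hypothetical helper names (`apparent_mse`, `cv_mse`). It only shows that nodes holding as many observations as model parameters give zero apparent error, while a cross-validated estimate does not.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least squares fit with intercept on the training set; predict on the test set."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def apparent_mse(X, y):
    """Mean squared error measured on the same data used for fitting."""
    return float(np.mean((y - fit_predict(X, y, X)) ** 2))

def cv_mse(X, y, n_folds=5, seed=0):
    """K-fold cross-validation estimate of prediction mean squared error."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = np.empty(len(y))
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        sq_err[test] = (y[test] - fit_predict(X[train], y[train], X[test])) ** 2
    return float(sq_err.mean())

# Synthetic data (an assumption for illustration): four predictors and a
# response that is pure noise, so no model can truly predict it.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = rng.normal(size=200)

# "Nodes" of 5 observations, each fitted with 5 parameters (4 slopes + intercept):
# every node is fitted exactly, so the apparent error vanishes.
node_errors = [apparent_mse(X[i:i + 5], y[i:i + 5]) for i in range(0, 200, 5)]
apparent = float(np.mean(node_errors))

# One model for all data, judged by cross-validation: error stays near the noise variance.
honest = cv_mse(X, y)
print(f"apparent MSE with tiny nodes: {apparent:.2e}")
print(f"cross-validated MSE:          {honest:.2f}")
```

The apparent error of the finely partitioned fit is zero to rounding error even though the response is unpredictable, which is why tree size has to be chosen with an out-of-sample estimate.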

[35] The regression tree in Figure 11 is obtained after
pruning. It is constructed recursively as follows. At each
node of the tree, a multiple linear regression model is
fitted to the data there. For each observation y, the model
gives a predicted value ŷ. Define the "residual" associated
with y by y - ŷ. If the model fits the data satisfactorily, a
plot of the residuals versus any predictor variable should
look like random noise.


[36] The tree in Figure 11 also provides empirical evidence
for nonlinearity between Tair-Tsst and
atmospheric variables. The GUIDE algorithm fits a multiple
linear regression to the data in each node of the tree.
Therefore all predictor variables are treated equally,
indicating that there is no leading variable. If there is no
nonlinearity, a single multiple linear model would be
sufficient. This would give a tree with a single terminal
node (i.e., no splits). The fact that the tree has so many
branches indicates that a single multiple linear model is
inadequate. Each time the algorithm detects nonlinearity, it
splits the data into two subsets and tries to fit a linear model
separately to each subset. If GUIDE is forced to use a single
predictor to fit each node (i.e., simple linear model), the tree
gets bigger. Not only are the slopes different in each node
(or segment), but even the selected predictors are different.

[37] Figure 12 shows the plots of residuals for each
predictor variable for the observations in the top node of
the tree. Vapor mixing ratio yields the most significant
curvature in terms of the chi-square test. Therefore, it
is selected to split the top node in the regression tree.

5.2. Importance of Atmospheric Forcing Variables

[38] The question of "which variable is the most important
for determining the magnitude of Tair-Tsst" can be
posed in two ways: (a) which variable has the smallest
prediction error if each variable is used singly to predict
Tair-Tsst, and (b) which variable's absence from a
model using all the predictors produces the largest increase
in prediction error? In (a) Tair-Tsst is considered to be a
function of only one variable (e.g., wind speed, net solar
radiation, etc., separately), while in (b) the dependence of
Tair-Tsst on all variables except one is taken into account.
In both cases, we use GUIDE to fit the necessary models
and compare its cross-validation estimates of prediction
mean squared errors to rank the variables in order of
importance with respect to their effect on Tair-Tsst.
Confidence intervals for the estimated errors are used to
determine the statistical significance of the ranks.

[39] We first examine the importance of each variable
when it is used singly in predicting Tair-Tsst over the
seasonal cycle on climatological timescales. This is done by
combining all monthly mean data (from January through
December) for ERA-40 and COADS, separately. The
analyses for individual months are reported later.

[40] Table 4 gives the cross-validation estimates of
prediction mean squared error for each variable when it is
used singly to predict Tair-Tsst as mentioned in (a). Net
solar radiation at the sea surface is the most important
predictor for Tair-Tsst, because it yields the smallest
prediction mean squared error estimate of 0.40 for ERA-40
and 0.37 for COADS. It is followed by vapor mixing ratio,
Tair, Tsst, precipitation and wind speed when using the
ERA-40 data set. Since the difference between the error
estimates for two variables is not statistically significant
if their 95% confidence intervals overlap, we conclude from
the table that net solar radiation is the most important
predictor and that the other variables are less important
and tied among themselves.
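
The overlap rule used here can be checked directly against the ERA-40 columns of Table 4. The following is a minimal sketch, with the estimates and 95% confidence intervals copied from that table; the dictionary layout and the helper name `overlaps` are illustrative choices, not part of the GUIDE software.

```python
# Estimated error and 95% confidence interval per single predictor
# (ERA-40 columns of Table 4).
era40 = {
    "wind speed":      (0.56, (0.54, 0.58)),
    "solar radiation": (0.40, (0.38, 0.42)),
    "mixing ratio":    (0.53, (0.51, 0.55)),
    "Tair":            (0.54, (0.52, 0.56)),
    "Tsst":            (0.55, (0.53, 0.57)),
    "precipitation":   (0.55, (0.53, 0.57)),
}

def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

# The predictor with the smallest estimated error is the reference.
best = min(era40, key=lambda name: era40[name][0])
best_ci = era40[best][1]

# A predictor differs significantly from the best one if the intervals do not overlap.
differs_from_best = {
    name: not overlaps(ci, best_ci)
    for name, (_, ci) in era40.items()
    if name != best
}

# The remaining predictors are "tied" if every pair of their intervals overlaps.
others = [ci for name, (_, ci) in era40.items() if name != best]
others_tied = all(
    overlaps(a, b) for i, a in enumerate(others) for b in others[i + 1:]
)

print(best, differs_from_best, others_tied)
```

With these numbers the interval for solar radiation is disjoint from all others, while the five remaining intervals overlap pairwise, which is the pattern described in the text.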

[41] To address question (b), we first fit a GUIDE model
using solar radiation, vapor mixing ratio, precipitation, and
wind speed as predictor variables. Then, a separate GUIDE
Figure 11. Piecewise linear GUIDE regression tree model for Tair-Tsst based on the January data
from ERA-40. At each branch, an observation goes to the left if the stated condition is satisfied;
otherwise it goes to the right. The number (in italics) beneath each leaf node is the average value of
Tair-Tsst. The predictor variables are wndspd (wind speed at 10 m above the sea surface in m s^-1),
radflx (net solar radiation at the sea surface in W m^-2), vapmix (vapor mixing ratio at 10 m above the
sea surface in kg kg^-1), and precip (precipitation in m s^-1). Note that for this particular example, the
predictor picked as most important for prediction of Tair-Tsst is the vapor mixing ratio at 10 m above
the sea surface.


Figure 12. Plots of residuals versus each predictor variable. The residuals are obtained from a piecewise
multiple linear GUIDE model shown in Figure 11. Based on the chi-squared test, vapor mixing ratio has
the most significant curvature, followed by the net solar radiation. Thus, vapor mixing ratio and net
solar radiation are the primary and secondary important variables for the prediction of Tair-Tsst. The other
two variables (wind speed and precipitation) have relatively less curvature and are therefore less important.
Table 4. Prediction of Mean Squared Error Using a Single Variable in GUIDE^a

                   Results for ERA-40                             Results for COADS
                   Estimated  Confidence     Statistically        Estimated  Confidence     Statistically
Variable           Error      Interval       Significant?         Error      Interval       Significant?
Wind speed         0.56       (0.54, 0.58)   No                   0.52       (0.50, 0.54)   No
Solar radiation    0.40       (0.38, 0.42)   Yes                  0.37       (0.35, 0.39)   Yes
Mixing ratio       0.53       (0.51, 0.55)   No                   0.53       (0.51, 0.55)   No
Tair               0.54       (0.52, 0.56)   No                   0.53       (0.51, 0.55)   No
Tsst               0.55       (0.53, 0.57)   No                   0.53       (0.51, 0.55)   No
Precipitation      0.55       (0.53, 0.57)   No                   0.50       (0.48, 0.52)   No

^a Cross-validation estimates of mean squared error are obtained when each variable is used as a sole predictor of Tair-Tsst in the
GUIDE models. Estimated errors are provided along with 95% confidence intervals for each predictor. The results for ERA-40 and
COADS are given separately.


model is fitted for each subset of three variables that leaves
out one. Tair and Tsst are not used as predictors
because Tair-Tsst is a linear function of these two
variables. Thus, Tair and Tsst would predict Tair-Tsst
perfectly. Since the prediction error for a model based on
a subset of variables is likely to be worse than that for a
model based on the whole set, we can use the increase in
prediction error due to variable exclusion to rank the
variables in their effect on Tair-Tsst.
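
The two ranking schemes, (a) each predictor used singly and (b) each predictor deleted in turn, can be sketched as follows. This is an illustration only: ordinary least squares stands in for the GUIDE piecewise linear model, the data are synthetic, and the column labels merely borrow the abbreviations used in Figure 11. The function names are hypothetical.

```python
import numpy as np

def cv_mse(X, y, n_folds=10, seed=0):
    """K-fold cross-validation estimate of prediction mean squared error
    for a least squares fit with intercept (a stand-in for a GUIDE model)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = np.empty(len(y))
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        sq_err[test] = (y[test] - A_test @ coef) ** 2
    return float(sq_err.mean())

def single_predictor_errors(X, y, names):
    """Scheme (a): error when each variable is the sole predictor."""
    return {name: cv_mse(X[:, [j]], y) for j, name in enumerate(names)}

def deleted_predictor_errors(X, y, names):
    """Scheme (b): error when each variable is left out of the full model."""
    return {name: cv_mse(np.delete(X, j, axis=1), y) for j, name in enumerate(names)}

# Synthetic example (assumed, not the ERA-40 or COADS data): the first
# column dominates the response, the second contributes weakly.
names = ["radflx", "vapmix", "wndspd", "precip"]
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=500)

single = single_predictor_errors(X, y, names)
deleted = deleted_predictor_errors(X, y, names)

most_important_a = min(single, key=single.get)    # smallest error when used alone
most_important_b = max(deleted, key=deleted.get)  # largest error when removed
print(most_important_a, most_important_b)
```

In this synthetic case both schemes pick the same variable, mirroring the agreement between questions (a) and (b) reported for net solar radiation.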

[42] Table 5 shows the results, where the GUIDE
algorithm is used to fit the models. We see that deletion
of net solar radiation produces the largest increase in
estimated prediction mean squared error of 0.39 (0.38) for
ERA-40 (COADS). The variable with the second largest
increase is wind speed (0.35) for ERA-40 and vapor
mixing ratio (0.33) for COADS. From the confidence
intervals reported in the table, the difference between
solar radiation and wind speed is statistically significant
for ERA-40 but the difference between solar radiation and
vapor mixing ratio for COADS is not statistically significant.
For both data sets, there is no statistical significance in the
differences between the other two remaining variables. Thus,
the result for the second most important variable is
inconclusive. It is safe, however, to say that the most
important variable is net solar radiation, a conclusion that
matches our answer for question (a).

[43] We also investigate whether or not the most
important predictors which affect Tair-Tsst vary greatly
by month. The GUIDE models for Tair-Tsst are first
constructed using a single predictor variable for each
month. Estimated prediction mean squared errors obtained
using both the ERA-40 and COADS data sets reveal that,
compared with the other predictors, net solar radiation at
the sea surface is the most important variable which
drives Tair-Tsst for all months except March, April,
September, and October (Table 6). Similarly, when three
of the four predictors are used in the GUIDE model, the
exclusion of net solar radiation produced the largest
estimated prediction mean squared error for all months
except January through March, September, and October for both
the ERA-40 and COADS data sets (Table 7). No other
predictor variable exhibits the same consistent behavior.

[44] Finally, an examination of important atmospheric
variables which mainly regulate Tair-Tsst is performed
for two regions involving the major oceanic current
systems (Kuroshio and Gulf Stream) where Tair-Tsst
experiences a large seasonal cycle, as discussed before
(see Figure 2). Clearly, these are regions where advection
by the major current systems could have a substantial
impact on Tair-Tsst [e.g., Dong and Kelly, 2004], but this
impact is neglected in this study, which is focused on the
response to atmospheric forcing variables.

[45] Similar to values given in Table 5, we calculate
prediction mean squared error for each atmospheric variable
in predicting Tair-Tsst based on the GUIDE analysis in
both regions. The Kuroshio region is bounded by 25°N-40°N
and 120°E-160°E, and the Gulf Stream region is
bounded by 30°N-45°N and 40°W-80°W.

[46] Confidence intervals for each atmospheric variable
are given in Table 8. The most important variable in
predicting Tair-Tsst is still net solar radiation at the sea
surface. This is true in both regions based on both ERA-40
and COADS data sets. The importance of the net solar
radiation is even more pronounced when the relationship
between Tair-Tsst and other variables is examined for the
relatively small Kuroshio region in comparison to the
global ocean. This is because the estimated error without
net solar radiation (1.96) is more than 2 times larger than
the error (0.78) for the second most important variable,
mixing ratio. This is not the case for either the Gulf
Stream region (Table 8) or the global ocean (Table 5).
We also note that precipitation seems to be the second


Table 5. Cross-Validation Estimates of Prediction Mean Squared Error^a

                   Results for ERA-40                             Results for COADS
                   Estimated  Confidence     Statistically        Estimated  Confidence     Statistically
Deleted Variable   Error      Interval       Significant?         Error      Interval       Significant?
Solar radiation    0.39       (0.37, 0.41)   Yes                  0.38       (0.36, 0.40)   Yes
Wind speed         0.35       (0.33, 0.37)   Yes                  0.30       (0.28, 0.32)   No
Mixing ratio       0.33       (0.31, 0.35)   No                   0.33       (0.31, 0.35)   No
Precipitation      0.30       (0.28, 0.32)   No                   0.32       (0.30, 0.34)   No

^a The estimated prediction mean squared errors are obtained in each case by fitting a GUIDE model to all except the indicated
variable in column 1. A variable is considered statistically significant if its confidence interval does not overlap with that of the
variable with the lowest estimated error.
Table 6. Estimates of Prediction Mean Squared Error by Month
Based on a Single Predictor^a

ERA-40   Wind Speed     Solar Radiation   Mixing Ratio    Precipitation
Jan      1.62 ± 0.10    0.99 ± 0.06       1.26 ± 0.06     1.63 ± 0.10
Feb      1.16 ± 0.06    0.79 ± 0.04       0.93 ± 0.05     1.11 ± 0.06
Mar      0.46 ± 0.02    0.45 ± 0.02       0.45 ± 0.03     0.42 ± 0.02
Apr      0.25 ± 0.01    0.24 ± 0.01       0.20 ± 0.01     0.27 ± 0.01
May      0.42 ± 0.01    0.25 ± 0.01       0.39 ± 0.01     0.43 ± 0.01
Jun      0.54 ± 0.02    0.28 ± 0.01       0.57 ± 0.02     0.57 ± 0.02
Jul      0.35 ± 0.01    0.25 ± 0.01       0.38 ± 0.01     0.34 ± 0.01
Aug      0.35 ± 0.01    0.26 ± 0.01       0.33 ± 0.01     0.32 ± 0.01
Sep      0.28 ± 0.01    0.25 ± 0.01       0.25 ± 0.01     0.26 ± 0.01
Oct      0.31 ± 0.01    0.30 ± 0.01       0.29 ± 0.01     0.29 ± 0.01
Nov      0.60 ± 0.03    0.40 ± 0.02       0.58 ± 0.03     0.58 ± 0.03
Dec      1.07 ± 0.06    0.66 ± 0.04       0.93 ± 0.05     1.06 ± 0.06

COADS    Wind Speed     Solar Radiation   Mixing Ratio    Precipitation
Jan      0.90 ± 0.05    0.47 ± 0.02       0.90 ± 0.04     0.92 ± 0.04
Feb      0.69 ± 0.03    0.50 ± 0.02       0.70 ± 0.03     0.72 ± 0.03
Mar      0.33 ± 0.01    0.30 ± 0.01       0.32 ± 0.01     0.30 ± 0.01
Apr      0.24 ± 0.01    0.21 ± 0.01       0.20 ± 0.01     0.23 ± 0.01
May      0.44 ± 0.02    0.34 ± 0.03       0.39 ± 0.02     0.41 ± 0.02
Jun      0.49 ± 0.01    0.32 ± 0.01       0.49 ± 0.01     0.48 ± 0.01
Jul      0.45 ± 0.01    0.30 ± 0.01       0.46 ± 0.01     0.42 ± 0.01
Aug      0.44 ± 0.01    0.35 ± 0.01       0.42 ± 0.01     0.41 ± 0.01
Sep      0.31 ± 0.01    0.28 ± 0.01       0.30 ± 0.01     0.28 ± 0.01
Oct      0.38 ± 0.02    0.34 ± 0.01       0.34 ± 0.01     0.36 ± 0.02
Nov      0.61 ± 0.02    0.35 ± 0.01       0.60 ± 0.02     0.56 ± 0.02
Dec      0.88 ± 0.04    0.43 ± 0.02       0.89 ± 0.04     0.82 ± 0.03

^a Cross-validation estimates of prediction mean squared error when a
single predictor variable is used in the GUIDE analysis. The values are
given for ERA-40 and COADS data sets separately. Standard errors (±) are
also provided. Each 95% confidence interval is obtained by taking two
times the standard error around the estimate.

most significant variable in controlling Tair-Tsst in the
Gulf Stream, a region where heat loss through evaporation
is significant.

6. Summary and Conclusions

[47] The relationships between Tair and Tsst, and between
Tair-Tsst and four independent atmospheric forcing
variables (net solar radiation at the sea surface, wind speed,
vapor mixing ratio at 10 m above the sea surface, and
precipitation at the sea surface), are investigated. Our analysis
is based on global monthly mean climatologies of these
variables from two global data sets: ERA-40 and COADS.

[48] The results clearly reveal that while there is a strong
correlation between Tair and Tsst over the global ocean, the
relationship between the two is not as simple as can be
described by a linear least squares approach. The reason is
that skill is very low between Tair and Tsst due to the large
unconditional bias (i.e., the bias due to the large differences
between mean Tair and Tsst) in some regions (e.g.,
equatorial regions).

[49] The atmospheric response to Tair-Tsst is generally a
function of more than one atmospheric forcing variable in
many regions over the global ocean on climatological
timescales. Therefore, a tree-based statistical methodology
that allows for nonlinear relationships between Tair-Tsst
and other atmospheric variables is used to determine the
most important of the variables considered in influencing
Tair-Tsst. The method fits piecewise linear models to
Tair-Tsst. Results using combined data (i.e., all 12 months) from
ERA-40 and COADS show that net solar radiation at the
sea surface is the most important predictor for Tair-Tsst
over the seasonal cycle. The same variable is also picked
when a similar analysis is performed using data for each
month separately. In particular, net solar radiation at the sea
surface is found to be a crucial parameter in predicting
Tair-Tsst for May through August, November and December.
Both data sets (ERA-40 and COADS) yield almost identical
results, reinforcing the importance of net solar radiation as a
predictor for Tair-Tsst. They also indicate the robustness of
the relationship between Tair-Tsst and other atmospheric
variables. The results, as revealed by the regression tree
models, point to the importance of the large-scale
environment in influencing Tair-Tsst. In addition, the methodology
presented in this paper shows that regression trees can be
applied to data with highly non-symmetric distributions
because the models do not require strong distributional
assumptions.

[50] The approach using the statistically based GUIDE
algorithm should be more effective in combination with
dynamical ocean models, such as the Princeton Ocean
Model (POM) and HYbrid Coordinate Ocean Model
(HYCOM). In addition, all of the analyses in this paper
are based on the assumption that Tair-Tsst is mainly driven
by local near-surface atmospheric variables. An examination
of the effects of dynamical processes, such as oceanic
upwelling and advection in the atmosphere and the ocean, in
driving the seasonal cycle of Tair-Tsst deserves a future
study. Such processes do not assume a one-dimensional
oceanic response to the local atmospheric forcing.

Appendix  A:  GUIDE  Algorithm 

[51] A brief description of how the GUIDE algorithm
proceeds is provided here. While there are other tree-based
prediction algorithms [e.g., Breiman et al., 1984; Alexander


Table 7. Estimated Increase in Prediction Mean Squared Error
From a Multipredictor Model^a

ERA-40   Wind Speed     Solar Radiation   Mixing Ratio    Precipitation
Jan      0.46 ± 0.03    0.62 ± 0.03       0.73 ± 0.05     0.52 ± 0.03
Feb      0.39 ± 0.02    0.44 ± 0.03       0.52 ± 0.03     0.48 ± 0.04
Mar      0.24 ± 0.02    0.20 ± 0.01       0.22 ± 0.01     0.28 ± 0.01
Apr      0.13 ± 0.01    0.15 ± 0.01       0.15 ± 0.01     0.13 ± 0.01
May      0.14 ± 0.01    0.24 ± 0.01       0.16 ± 0.01     0.13 ± 0.01
Jun      0.18 ± 0.01    0.33 ± 0.02       0.18 ± 0.01     0.13 ± 0.00
Jul      0.14 ± 0.00    0.20 ± 0.01       0.15 ± 0.00     0.12 ± 0.00
Aug      0.12 ± 0.01    0.17 ± 0.01       0.16 ± 0.00     0.13 ± 0.00
Sep      0.17 ± 0.00    0.18 ± 0.00       0.19 ± 0.00     0.19 ± 0.00
Oct      0.18 ± 0.01    0.19 ± 0.01       0.20 ± 0.01     0.21 ± 0.01
Nov      0.23 ± 0.01    0.33 ± 0.02       0.26 ± 0.01     0.23 ± 0.01
Dec      0.36 ± 0.02    0.48 ± 0.03       0.48 ± 0.02     0.40 ± 0.02

COADS    Wind Speed     Solar Radiation   Mixing Ratio    Precipitation
Jan      0.33 ± 0.02    0.34 ± 0.02       0.26 ± 0.01     0.38 ± 0.02
Feb      0.30 ± 0.02    0.30 ± 0.01       0.25 ± 0.01     0.34 ± 0.02
Mar      0.16 ± 0.01    0.16 ± 0.01       0.17 ± 0.01     0.22 ± 0.01
Apr      0.14 ± 0.01    0.15 ± 0.01       0.15 ± 0.01     0.14 ± 0.01
May      0.18 ± 0.02    0.29 ± 0.02       0.27 ± 0.02     0.18 ± 0.01
Jun      0.17 ± 0.01    0.33 ± 0.01       0.21 ± 0.01     0.18 ± 0.01
Jul      0.15 ± 0.00    0.29 ± 0.01       0.20 ± 0.01     0.15 ± 0.00
Aug      0.22 ± 0.01    0.33 ± 0.01       0.27 ± 0.01     0.23 ± 0.01
Sep      0.22 ± 0.01    0.24 ± 0.01       0.23 ± 0.01     0.22 ± 0.01
Oct      0.24 ± 0.01    0.26 ± 0.01       0.29 ± 0.02     0.24 ± 0.01
Nov      0.23 ± 0.01    0.41 ± 0.02       0.26 ± 0.01     0.23 ± 0.01
Dec      0.32 ± 0.02    0.46 ± 0.02       0.29 ± 0.01     0.31 ± 0.01

^a Cross-validation estimates of increase are shown by month, using
GUIDE with all except one of the predictor variables.
Table 8. Cross-Validation Estimates of Prediction Mean Squared Error for Kuroshio and Gulf Stream Regions^a

                   Results for ERA-40                             Results for COADS
                   Estimated  Confidence     Statistically        Estimated  Confidence     Statistically
Deleted Variable   Error      Interval       Significant?         Error      Interval       Significant?

Kuroshio Region
Solar radiation    1.96       (1.84, 2.08)   Yes                  0.91       (0.85, 0.96)   Yes
Mixing ratio       0.78       (0.73, 0.83)   Yes                  0.32       (0.32, 0.33)   Yes
Wind speed         0.57       (0.53, 0.61)   Yes                  0.24       (0.22, 0.25)   No
Precipitation      0.44       (0.41, 0.46)   No                   0.21       (0.19, 0.22)   No

Gulf Stream Region
Solar radiation    0.87       (0.83, 0.91)   Yes                  0.60       (0.58, 0.63)   Yes
Precipitation      0.73       (0.70, 0.76)   Yes                  0.49       (0.47, 0.52)   Yes
Mixing ratio       0.62       (0.59, 0.65)   No                   0.50       (0.48, 0.52)   Yes
Wind speed         0.62       (0.59, 0.65)   No                   0.44       (0.42, 0.46)   No

^a The results are shown as in Table 5, i.e., estimated prediction mean squared errors for the deleted variable for each case. A
variable is considered statistically significant if its confidence interval does not overlap with that of the variable with the lowest
estimated error.


and Grimshaw, 1996; Ripley, 1996], GUIDE, as used in this
paper, has advantages over them for three main
reasons: (1) it has a negligible selection bias, (2) it includes
categorical predictor variables, and (3) it is sensitive to
pairwise interactions between regressor variables.

[52] Suppose that a random observation $(\mathbf{x}, y)$ is generated
by the relation $y = f(\mathbf{x}) + \epsilon$, where $f(\mathbf{x})$ is an unknown
function, $\epsilon$ represents random variation with zero expectation,
and $\mathbf{x}$ may be a vector of predictor variables
$\mathbf{x} = (x_1, x_2, \ldots, x_k)$. In this context $f$ is called a regression function.

[53] Given a data set $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\}$ of $n$
observations, there are many methods of estimating $f$.
Clearly, it is desirable to choose an estimate $\hat{f}$ such that
the expected square prediction error $E\{y^* - \hat{f}(\mathbf{x}^*)\}^2$ is
small, where $(\mathbf{x}^*, y^*)$ denotes a future observation that is not
in the data used to construct $\hat{f}$.

[54] If $f$ is known to be a linear function of $x_1, x_2, \ldots, x_k$,
the least squares method often yields an estimate with
excellent expected square prediction error. But the latter
can be very large if $f$ is not a linear function. In that case, a
nonparametric method that adapts to the complexity of $f$ is
usually preferred. One such method is GUIDE, which yields
a piecewise linear estimate of $f$. Besides being completely
automatic and adaptive, GUIDE has the unique feature that
the data partitions defining the piecewise linear estimate can
be displayed graphically as a binary decision tree.

[55] The algorithm aims to identify the predictor variable
whose residual plot exhibits the highest degree of non-randomness,
because this variable is most likely to have a nonlinear
effect on the dependent variable. GUIDE employs the
chi-square test [e.g., Jaisingh and Rozakis, 2000] to measure the
degree of nonlinearity in each predictor variable.
Specifically, it first groups the predictor values into four groups at
the sample quartiles and then cross-tabulates the grouped
values with the signs of the residuals. The predictor variable
yielding the most significant chi-square statistic is selected
to split the node with an inequality of the form x < c. Each
value of c divides the data into two subsets and a multiple
linear regression model is fitted to the data in each subset.
The value of c that yields the smallest total sum of squared
residuals in these two regression models is selected as the
best split of the node. Complete details on the curvature test
and the pruning algorithm are given in Loh [2002].
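
The split selection step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the GUIDE implementation of Loh [2002]: the data are synthetic, the function names are hypothetical, candidate split points are restricted to a grid of sample quantiles, and predictors are compared by the raw chi-square statistic (all tables are 2 x 4, so the largest statistic corresponds to the smallest p-value).

```python
import numpy as np

def linear_fit_sse(X, y):
    """Fit a multiple linear model with intercept; return (SSE, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), resid

def curvature_chi2(x, resid):
    """Pearson chi-square statistic of the 2 x 4 table that cross-tabulates
    residual sign against the quartile group of predictor x."""
    edges = np.quantile(x, [0.25, 0.5, 0.75])
    group = np.searchsorted(edges, x, side="right")     # quartile group 0..3
    positive = (resid > 0).astype(int)                  # 0 = non-positive, 1 = positive
    table = np.zeros((2, 4))
    np.add.at(table, (positive, group), 1)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    mask = expected > 0
    return float(np.sum((table[mask] - expected[mask]) ** 2 / expected[mask]))

def best_split(X, y, j, n_candidates=19, min_node=20):
    """Search split points c for x_j < c; keep the one with the smallest
    total SSE of the two linear models fitted on either side."""
    best_c, best_sse = None, np.inf
    for c in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_candidates)):
        left = X[:, j] < c
        if left.sum() < min_node or (~left).sum() < min_node:
            continue
        total = linear_fit_sse(X[left], y[left])[0] + linear_fit_sse(X[~left], y[~left])[0]
        if total < best_sse:
            best_c, best_sse = float(c), total
    return best_c, best_sse

# Synthetic data (assumed): the response has a kink at zero in the first
# predictor and depends linearly on the second.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 2))
y = 3.0 * np.abs(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

_, resid = linear_fit_sse(X, y)
stats = [curvature_chi2(X[:, j], resid) for j in range(X.shape[1])]
split_var = int(np.argmax(stats))          # predictor with the strongest curvature signal
split_c, split_sse = best_split(X, y, split_var)
print(split_var, round(split_c, 2))
```

Here the residuals of the single linear fit are mostly positive in the outer quartiles of the first predictor and negative in the inner ones, so that predictor is selected, and the chosen split point falls near the kink at zero.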
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